21 research outputs found

Statistical Learning Algorithm for Tree Similarity

Author: Atsuhiro Takasu
Daiji Fukagawa
Tatsuya Akutsu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Tree edit distance is one of the most frequently used distance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each operation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O(mn^2 d^6)) and space (O(n^2 d^4)), where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively.

CiteSeerX

Crossref

A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures

Author: Akutsu Tatsuya
Fukagawa Daiji
Takasu Atsuhiro
Tamura Takeyuki
Tomita Etsuji
Publication venue: BioMed Central
Publication date: 01/01/2011

[Background] Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. [Results] In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate size unordered trees. It also suggests that the proposed method achieves accuracy comparable to that of the edit distance for ordered trees and of an existing method for glycan search. [Conclusions] The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.

Crossref

Springer - Publisher Connector

PubMed Central

Kyoto University Research Information Repository

Modeling topical trends over continuous time with priors

Author: Fukagawa Daiji
Masada Tomonari
Oguri Kiyoshi
Shibata Yuichiro
Takasu Atsuhiro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

In this paper, we propose a new method for topical trend analysis. We model topical trends by per-topic Beta distributions as in Topics over Time (TOT), proposed as an extension of latent Dirichlet allocation (LDA). However, TOT is likely to overfit to timestamp data in extracting latent topics. Therefore, we apply prior distributions to Beta distributions in TOT. Since the Beta distribution has no conjugate prior, we devise a trick, where we set one of the two parameters of each per-topic Beta distribution to one based on a Bernoulli trial and apply a Gamma distribution as a conjugate prior. Consequently, we can marginalize out the parameters of Beta distributions and thus treat timestamp data in a Bayesian fashion. In the evaluation experiment, we compare our method with LDA and TOT on a link detection task on the TDT4 dataset. We use word predictive probabilities as term weights and estimate document similarities by using those weights in a TFIDF-like scheme. The results show that our method achieves a moderate fitting to timestamp data. Advances in Neural Networks - ISNN 2010: 7th International Symposium on Neural Networks, ISNN 2010, Shanghai, China, June 6-9, 2010, Proceedings, Part II. The original publication is available at www.springerlink.co

Nagasaki University's Academic Output SITE: NAOSITE

Institutional Repositories DataBase (IRDB)

Nagasaki university's Academic Output SITE

Dynamic hyperparameter optimization for bayesian topical trend analysis

Author: Fukagawa Daiji
Hamada Tsuyoshi
Masada Tomonari
Oguri Kiyoshi
Shibata Yuichiro
Takasu Atsuhiro
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

This paper presents a new Bayesian topical trend analysis. We regard the parameters of topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters by a gradient-based algorithm. Since our method gives similar hyperparameters to documents having similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarities. We compute TFIDF-based document similarities by using a result of collapsed Gibbs sampling and evaluate our proposal on the link detection task of Topic Detection and Tracking. Proceeding of the 18th ACM conference: Hong Kong, China, 2009.11.02-2009.11.0

Nagasaki University's Academic Output SITE: NAOSITE

Discrete Optimization Algorithms for Structured Data in Bioinformatics

Author: Fukagawa Daiji
Publication venue: Kyoto University
Publication date: 23/03/2006

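Kyoto University; 0048; new system, course doctorate; Doctor of Informatics; Kō No. 12437; Informatics Doctorate No. 191; call number 新制||情||43 (University Library); 24273; UT51-2006-J428; Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University; examiners: Prof. Tatsuya Akutsu (chief examiner), Prof. Yasuo Okabe, Prof. Hiroshi Nagamochi; conferred under Article 4, Paragraph 1 of the Degree Regulations; Doctor of Informatics, Kyoto University; DA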

Kyoto University Research Information Repository

Approximating Tree Edit Distance through String Edit Distance

Author: Akutsu Tatsuya
Fukagawa Daiji
Takasu Atsuhiro
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2010

We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm, each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees are modified so that structures of small subtrees are reflected to the labels. We show that the edit distance between trees is at least 1/6 and at most O(n^(3/4)) of the edit distance between the transformed strings, where n is the maximum size of two input trees and we assume unit cost edit operations for both trees and strings. The algorithm works in O(n^2) time since computation of edit distance and reconstruction of tree mapping from string alignment takes O(n^2) time though transformation itself can be done in O(n) time.

Kyoto University Research Information Repository
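
The Euler-string idea in the abstract above can be illustrated with a minimal sketch. It assumes a tree is a hypothetical `(label, children)` tuple, labels each edge with its child's label, and marks the upward traversal with a `~` prefix. It uses a plain Euler string only: the label modification step that the abstract describes for small subtrees is not reproduced here, so the paper's stated bounds should not be expected to carry over to this simplified version.

```python
from typing import List, Tuple

# Hypothetical tree representation: (label, [child, child, ...]).
Tree = Tuple[str, list]


def euler_string(tree: Tree) -> List[str]:
    """Return the Euler string of an ordered rooted tree.

    Each edge is labeled with its child's label. Walking down an edge emits
    the label; walking back up emits the label prefixed with '~'.
    The root's own label does not appear, since only edges are traversed.
    """
    _, children = tree
    tokens: List[str] = []
    for child in children:
        child_label = child[0]
        tokens.append(child_label)           # going down the edge
        tokens.extend(euler_string(child))   # tour of the child's subtree
        tokens.append("~" + child_label)     # coming back up the edge
    return tokens


def string_edit_distance(s: List[str], t: List[str]) -> int:
    """Unit-cost edit distance between two token sequences, O(|s||t|) time."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


t1 = ("r", [("a", [("b", [])]), ("c", [])])
t2 = ("r", [("a", []), ("c", [])])
print(euler_string(t1))
print(string_edit_distance(euler_string(t1), euler_string(t2)))
```

In this toy pair, removing node `b` from `t1` yields `t2`, which corresponds to deleting the two tokens `b` and `~b` from the Euler string. The dynamic program is quadratic and the transformation is linear, in line with the running times quoted in the abstract.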

Inferring a graph from path frequency

Author: Akutsu Tatsuya
Fukagawa Daiji
Jansson Jesper
Sadakane Kunihiko
Publication venue: 'Elsevier BV'
Publication date: 01/07/2012

This paper considers the problem of inferring a graph from the number of occurrences of vertex-labeled paths, which is closely related to the pre-image problem for graphs: to reconstruct a graph from its feature space representation. It is shown that both exact and approximate versions of the problem can be solved in polynomial time in the size of an output graph by using dynamic programming algorithms if the graphs are trees whose maximum degree is bounded by a constant and the lengths of given paths and alphabet size are bounded by constants. On the other hand, it is shown that this problem is strongly NP-hard even for trees of bounded degree if the maximum length of paths is not bounded. The problem of inferring a string from the number of occurrences of fixed size substrings is also studied.

Elsevier - Publisher Connector

Kyoto University Research Information Repository
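
To make the input of the inference problem in the abstract above concrete, the following sketch computes a path-frequency feature vector in the forward direction, from a given graph to its counts. It is only an illustration of the representation, not the paper's inference algorithm. The function name `path_frequencies`, the adjacency-dict encoding, and the convention that each simple path is counted once per direction are assumptions made here.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def path_frequencies(adj: Dict[Hashable, List[Hashable]],
                     labels: Dict[Hashable, str],
                     max_vertices: int) -> Counter:
    """Count vertex-labeled simple paths with at most max_vertices vertices.

    Assumption: a path and its reverse are counted separately, so the
    label sequences ('a', 'b') and ('b', 'a') are distinct features.
    """
    counts: Counter = Counter()

    def extend(path: Tuple[Hashable, ...]) -> None:
        # Record the label sequence spelled out by the current path.
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) == max_vertices:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep the path simple
                extend(path + (nxt,))

    for start in adj:
        extend((start,))
    return counts


# A three-vertex path graph 0 - 1 - 2 with labels a, b, a.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "a", 1: "b", 2: "a"}
features = path_frequencies(adj, labels, max_vertices=3)
print(dict(features))
```

The inference problem studied in the paper goes the other way: given such a table of counts, reconstruct a graph that realizes it exactly or approximately.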

Exact algorithms for computing the tree edit distance between unordered trees

Author: Akutsu Tatsuya
Fukagawa Daiji
Takasu Atsuhiro
Tamura Takeyuki
Publication venue: 'Elsevier BV'
Publication date: 04/02/2011

This paper presents a fixed-parameter algorithm for the tree edit distance problem for unordered trees under the unit cost model that works in O(2.62^k⋅poly(n)) time and O(n^2) space, where the parameter k is the maximum bound of the edit distance and n is the maximum size of input trees. This paper also presents polynomial-time algorithms for the case where the maximum degree of the largest common subtree is bounded by a constant.

Elsevier - Publisher Connector

Kyoto University Research Information Repository

Possibilities for Classical Studies through AI Activities

Author: Daiji Fukagawa
深川大路
Publication venue: Institute for the Study of Humanities and Social Sciences, Doshisha University
Publication date: 05/12/2022

Institutional Repositories DataBase (IRDB)

Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees

Author: Akutsu Tatsuya
Fukagawa Daiji
Halldórsson Magnús M.
Takasu Atsuhiro
Tanaka Keisuke
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Given two rooted, labeled, unordered trees, the common subtree problem is to find a bijective matching between subsets of nodes of the trees of maximum cardinality which preserves labels and ancestry relationship. The tree edit distance problem is to determine the least cost sequence of insertions, deletions and substitutions that converts a tree into another given tree. Both problems are known to be hard to approximate within some constant factor in general. We tackle these problems from two perspectives: giving exact algorithms, either for special cases or in terms of some parameters; and approximation algorithms and hardness of approximation. We present a parameterized algorithm in terms of the number of branching nodes that solves both problems and yields polynomial algorithms for several special classes of trees. This is complemented with a tighter APX-hardness proof that holds when the trees are of height one and two, respectively. Furthermore, we present the first approximation algorithms for both problems. In particular, for the common subtree problem for t trees, we present an algorithm achieving a t⋅log^2(b_OPT+1) ratio, where b_OPT is the number of branching nodes in the optimal solution. We also present constant factor approximation algorithms for both problems in the case of bounded height trees.

Kyoto University Research Information Repository